Skip to content

Conversation

jladd-geant
Copy link
Contributor

@jladd-geant jladd-geant commented Jun 18, 2025

Scope and purpose

Fixes #1502.

Depends on #1506.

Adds a new INCIDENT_RESTART event to allow source systems to reopen incidents.
Described further in linked issue.

This pull request

  • changes the database

Contributor Checklist

Every pull request should have this checklist filled out, no matter how small it is.
More information about contributing to Argus can be found in the
Development docs.

  • Added a changelog fragment for towncrier

  • Added/amended tests for new/changed code
    Doesn't break any tests, doesn't introduce any untested lines of code, and is implemented in the same way as the existing REOPEN event. So adding/changing tests didn't seem necessary, happy to do otherwise if needed.

  • Added/changed documentation, including updates to the user manual if feature flow or UI is considerably changed

  • Linted/formatted the code with ruff and djLint, easiest by using pre-commit

  • The first line of the commit message continues the sentence "If applied, this commit will ...", starts with a capital letter, does not end with punctuation and is 50 characters or less long. See our how-to

  • If this results in changes to the database model: Updated the ER diagram
    There are no changes to the ER diagram as a result of adding a new event type

@codecov-commenter
Copy link

codecov-commenter commented Jun 18, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.46%. Comparing base (ae5eb17) to head (6abdd2a).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1504   +/-   ##
=======================================
  Coverage   79.46%   79.46%           
=======================================
  Files         121      121           
  Lines        5409     5410    +1     
=======================================
+ Hits         4298     4299    +1     
  Misses       1111     1111           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@jladd-geant
Copy link
Contributor Author

Only thing im unsure of is the migration generated seems to be duplicating changes? It seems to be converting the id of some fields to BigAutoField despite that already being in the previous squashed migration? I cant imagine it would cause any issues but not really sure whats going on with that.
Let me know if I should make any changes.

@jladd-geant jladd-geant marked this pull request as ready for review June 18, 2025 15:43
@elfjes
Copy link
Contributor

elfjes commented Jun 19, 2025

Code looks good to me :) That migration is a bit strange yeah, maybe @hmpf can shed some light on what's happening there

Can you add one or more tests?

  • RES should reopen a closed incident
  • only source systems can restart incidents
  • Defined behaviour on RES an already open incident?

@johannaengland johannaengland self-requested a review June 19, 2025 06:35
@johannaengland johannaengland added data model Affects the data model and/or SQL schema geant Needed by or originated from geant-argus data migration Batch change to data in the database labels Jun 19, 2025
@hmpf
Copy link
Contributor

hmpf commented Jun 19, 2025

Agree about the migrations, the change to BigAutoField should already have happened . On Tuesday I hope to be able to check this properly.

Do python manage.py dbshell against prod and dev database. In dbshell run select * from django_migrations where app = 'argus_incident';.

You should have this line (first column may be different, last column deffo different)

 89 | argus_incident | 0009_use_bigautofield_on_quickly_growing_tables | 2025-05-16 08:23:36.499823+02

If not in prod db I'll see if I can do some experiments (with your help) next week, we might need a release that canonicalizes the squashed migrations (=deletes the old ones, needs manual rewrite of the new migrations). If not in dev: nuke the dev database and migrate to version 1.37.0 and check again before you migrate to 2.0.0.

Also do \d argus_incident_incident and check that the Type of the id field is already bigint:

       Column       |           Type           | Collation | Nullable |                       Default                       
--------------------+--------------------------+-----------+----------+-----------------------------------------------------
 id                 | bigint                   |           | not null | nextval('argus_incident_incident_id_seq'::regclass)

Copy link
Contributor

@johannaengland johannaengland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree both with @hmpf and @elfjes, looking into the migration problem is important and I would like to see more tests.

And I have a few additional comments, because currently a source system can restart an incident, but not end it again

@johannaengland
Copy link
Contributor

Can you add one or more tests?

I would also like to see a test for restarting an ended incident and then ending it again.

* Defined behaviour on RES an already open incident?

This should simply post the event without changing anything about the incident itself.

@jladd-geant
Copy link
Contributor Author

jladd-geant commented Jun 20, 2025

Agree about the migrations, the change to BigAutoField should already have happened . On Tuesday I hope to be able to check this properly.

Do python manage.py dbshell against prod and dev database. In dbshell run select * from django_migrations where app = 'argus_incident';.

You should have this line (first column may be different, last column deffo different)

 89 | argus_incident | 0009_use_bigautofield_on_quickly_growing_tables | 2025-05-16 08:23:36.499823+02

If not in prod db I'll see if I can do some experiments (with your help) next week, we might need a release that canonicalizes the squashed migrations (=deletes the old ones, needs manual rewrite of the new migrations). If not in dev: nuke the dev database and migrate to version 1.37.0 and check again before you migrate to 2.0.0.

Also do \d argus_incident_incident and check that the Type of the id field is already bigint:

       Column       |           Type           | Collation | Nullable |                       Default                       
--------------------+--------------------------+-----------+----------+-----------------------------------------------------
 id                 | bigint                   |           | not null | nextval('argus_incident_incident_id_seq'::regclass)

Ok so have investigated this some more and it actually seems to be a small issue with the squashed migration. Firstly if I just checkout to master with a blank db and make migrations it makes the same changes for the ID's (except what I added of course), i assume there should be no detected changes. What I think has happened is that originally the ID fields were being autocreated (not defined in model) but then explicitly defined as bigints in 10bc007. So in migration 0001_initial it does fields=[("id", models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name="ID")) but after being explicitly defined as bigint the alterfield in 0009 does field=models.BigAutoField(primary_key=True, serialize=False),. But in the squashed migration its treating the id's as autocreated even though they are explicitly defined in the model to be bigints. Im not sure how the squashed migration is made but changing

('id', models.BigAutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),

to

('id', models.BigAutoField(primary_key=True, serialize=False),

in the squashed migration stops makemigrations from detecting changes.
DEFAULT_AUTO_FIELD is just AutoField in base.py so doesnt seem to make sense for a BigAutoField to be autocreated.
I suppose this could somehow be an issue on my end or im misunderstanding things haha but this is what I think is going on. I suppose this would need a separate PR to fix the squashed migration?

@hmpf hmpf added the blocked Another thing/issue has to be resolved before tackling this label Jun 24, 2025
@jladd-geant jladd-geant force-pushed the incident-restart-event branch from 1af8625 to 6b649a9 Compare June 25, 2025 10:36
@jladd-geant
Copy link
Contributor Author

jladd-geant commented Jun 25, 2025

Ok have now

  • Fixed the issues with the code
  • Made a unit test to restart a closed incident then close it again
  • Fixed the migration
  • Small change to docs

Let me know if you want more tests/changes to what I made, the test does have very similar functionality to the test for reopening i'm not sure if that should be combined. fixed, seems ok now

@jladd-geant jladd-geant requested a review from hmpf June 25, 2025 16:36
Copy link

@johannaengland johannaengland removed the blocked Another thing/issue has to be resolved before tackling this label Jun 27, 2025
Copy link
Contributor

@johannaengland johannaengland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nitpicks about naming/function parameters

Comment on lines +123 to +125
reopen_event_dict = self._create_event_dict(Event.Type.INCIDENT_RESTART)
response = self.source1_rest_client.post(self.events_url(self.stateful_incident1), reopen_event_dict)
self.assertEqual(parse_datetime(response.data["timestamp"]), reopen_event_dict["timestamp"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
reopen_event_dict = self._create_event_dict(Event.Type.INCIDENT_RESTART)
response = self.source1_rest_client.post(self.events_url(self.stateful_incident1), reopen_event_dict)
self.assertEqual(parse_datetime(response.data["timestamp"]), reopen_event_dict["timestamp"])
restart_event_dict = self._create_event_dict(Event.Type.INCIDENT_RESTART)
response = self.source1_rest_client.post(self.events_url(self.stateful_incident1), restart_event_dict)
self.assertEqual(parse_datetime(response.data["timestamp"]), restart_event_dict["timestamp"])

self.assertTrue(self.stateful_incident1.events.filter(pk=response.data["pk"]).exists())
self.assertEqual(response.data["incident"], self.stateful_incident1.pk)

def _assert_incident_is_closed_at_timestamp(self, end_event_timestamp):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be cleaner to have the incident as a parameter as well instead of assuming it is self.stateful_incident1.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data migration Batch change to data in the database data model Affects the data model and/or SQL schema geant Needed by or originated from geant-argus

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow source sytems to reopen incidents

5 participants